Introduction to Freva
Martin Bergemann, Bijan Fallah, Andrej Fast, Mostafa Hadizadeh, Christopher Kadow, Etor E. Lucio-Eceiza, Felix Oertel, Manuel Reis, and many others…
Deutsches Klimarechenzentrum @ CLINT in Data Analysis Dpt. (+ Data Management Dpt.*)
Common Problem I: Finding Data
Common Problem II: Using code of others
Common Problem III: Reproducing your results
- How can we search and access various datasets efficiently?
- How can we streamline user data analysis tools (reusable and reproducible)?
Yet another solution: The Freva framework
- Flexibility
- Standardisation
- Centralisation
- Transparency
Let's get an overview...
Flexible access
Freva is a (mainly) Python 3 framework
Running at DKRZ's HPC, it comes in 3 flavours
- Command Line Interface (CLI)
- Web User Interface
- Python module
Each interface offers similar and interconnected features
Standardized data
- CMOR: maps from/to several ESGF standards (CMIP6, CORDEX, pseudo-CMIP5, nextGEMS flavours)
- Data ingested via Apache Solr
- Millions of files available (>10 million)
- Intuitive queries & fast results
- Metadata preview
- Time selection
- Generates reproducible freva command & URL
- Indexing of POSIX files, tape archives and intake catalogs, in multiple formats (NetCDF, Zarr…)
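The search features listed above (facet queries, time selection, a reproducible command) can be illustrated with a small stand-alone sketch. It does not use Freva or Solr: the `Entry` class, the `search` and `as_command` helpers, the file paths and the `freva databrowser` subcommand name are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One indexed file: a toy stand-in for a document in the search index."""
    path: str
    facets: dict
    start: str  # first time step, ISO date
    end: str    # last time step, ISO date


def search(index, time=None, **facets):
    """Return paths whose facets all match and whose period overlaps `time`."""
    hits = []
    for entry in index:
        if any(entry.facets.get(key) != value for key, value in facets.items()):
            continue
        # ISO dates sort correctly as plain strings, so no parsing is needed.
        if time is not None and (entry.end < time[0] or entry.start > time[1]):
            continue
        hits.append(entry.path)
    return hits


def as_command(**facets):
    """Render a facet query as a shareable key=value command line (assumed syntax)."""
    pairs = [f"{key}={value}" for key, value in sorted(facets.items())]
    return " ".join(["freva", "databrowser", *pairs])


# Hypothetical index content, for illustration only.
INDEX = [
    Entry("/data/tas_hist.nc", {"project": "cmip6", "variable": "tas"}, "1850-01-01", "2014-12-31"),
    Entry("/data/tas_ssp.nc", {"project": "cmip6", "variable": "tas"}, "2015-01-01", "2100-12-31"),
    Entry("/data/pr_eur.nc", {"project": "cordex", "variable": "pr"}, "1950-01-01", "2005-12-31"),
]

print(search(INDEX, project="cmip6", variable="tas", time=("2000-01-01", "2010-12-31")))
print(as_command(variable="tas", project="cmip6"))
```

Sorting the facets before rendering means that the same query always yields the same command string, which is what makes it easy to share and re-run.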
Easy incorporation of tools
- Flexible programming language: NO specific language (only free software!)
- Standardised API in Python 3: no need to know all the code environments
- Tool: ANY language (Python, R, C, Fortran, a mix...)
./movie_plotter.sh /path/2/INPUT /path/2/OUTPUT
- Plugin API (Wrapper): in Python
from evaluation_system.api import plugin
from evaluation_system.api.parameters import (
    ParameterDictionary as ParamDict,
    String,
)


class MoviePlotter(plugin.PluginAbstract):
    __short_description__ = "Plots 2D lon/lat movies in GIF format"
    __version__ = (0, 0, 1)
    __parameters__ = ParamDict(
        String(name='input', default=None, mandatory=True, help='File to plot'),
        String(name='outputdir', default=None, mandatory=True, help='default output dir'),
    )

    def run_tool(self, config_dict=None):
        input = config_dict['input']
        outputdir = config_dict['outputdir']
        # Call the wrapped shell tool that lives next to this wrapper
        self.call(f'{self.class_basedir}/movie_plotter.sh {input} {outputdir}')
        return self.prepare_output(config_dict['outputdir'])
- Freva command:
freva plugin MoviePlotter input=/foo/bar.nc outputdir=/path/2/OUTPUT
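To make the link between the `key=value` arguments of the Freva command and the `config_dict` seen by the wrapper more concrete, here is a minimal stand-alone sketch. It is not Freva's implementation: `parse_plugin_args`, `build_command` and the `MOVIE_PLOTTER_PARAMS` table are hypothetical helpers that only mimic the default/mandatory behaviour declared in the wrapper above.

```python
import shlex


def parse_plugin_args(args, parameters):
    """Turn ['key=value', ...] into a config dict.

    `parameters` maps each parameter name to a (default, mandatory) pair.
    """
    config = {name: default for name, (default, _) in parameters.items()}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {arg!r}")
        if key not in parameters:
            raise ValueError(f"unknown parameter {key!r}")
        config[key] = value
    missing = [
        name
        for name, (_, mandatory) in parameters.items()
        if mandatory and config[name] is None
    ]
    if missing:
        raise ValueError(f"missing mandatory parameter(s): {', '.join(missing)}")
    return config


def build_command(script, config):
    """Build the shell call for the wrapped tool, quoting each argument."""
    return shlex.join([script, config["input"], config["outputdir"]])


# Hypothetical mirror of the two mandatory String parameters of MoviePlotter.
MOVIE_PLOTTER_PARAMS = {"input": (None, True), "outputdir": (None, True)}

config = parse_plugin_args(
    ["input=/foo/bar.nc", "outputdir=/path/2/OUTPUT"], MOVIE_PLOTTER_PARAMS
)
command = build_command("./movie_plotter.sh", config)
print(command)
```

Quoting the arguments with `shlex.join` is a design choice of this sketch; it keeps paths containing spaces from being split when the tool is called through a shell.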
Transparency & reproducibility
- Every config is stored in a (MariaDB) database
- Every config is searchable
- Every config can be modified & re-run
- History stores the Git versions of the plugin & the Freva system!
- Saves CPU hours, I/O and storage!
Additionally:
- Results can be shared and commented
- Configs can be compared with previous similar runs
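The idea of a searchable, re-runnable configuration history can be sketched with Python's built-in `sqlite3` module standing in for MariaDB. The table layout and the helper names (`open_history`, `record_run`, `load_config`, `find_identical_run`, `diff_configs`) are assumptions for illustration, not Freva's actual schema or API.

```python
import json
import sqlite3


def open_history(path=":memory:"):
    """Open (or create) a toy history database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS history ("
        "id INTEGER PRIMARY KEY, plugin TEXT NOT NULL, "
        "version TEXT NOT NULL, config TEXT NOT NULL)"
    )
    return db


def record_run(db, plugin, version, config):
    """Store one run; the config is serialised with sorted keys."""
    cursor = db.execute(
        "INSERT INTO history (plugin, version, config) VALUES (?, ?, ?)",
        (plugin, version, json.dumps(config, sort_keys=True)),
    )
    db.commit()
    return cursor.lastrowid


def load_config(db, run_id):
    """Fetch a stored config so that it can be modified and re-run."""
    row = db.execute("SELECT config FROM history WHERE id = ?", (run_id,)).fetchone()
    if row is None:
        raise KeyError(run_id)
    return json.loads(row[0])


def find_identical_run(db, plugin, version, config):
    """Return the id of an earlier run with the same plugin, version and config."""
    row = db.execute(
        "SELECT id FROM history WHERE plugin = ? AND version = ? AND config = ? "
        "ORDER BY id LIMIT 1",
        (plugin, version, json.dumps(config, sort_keys=True)),
    ).fetchone()
    return row[0] if row else None


def diff_configs(old, new):
    """Map each differing key to its (old, new) values."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


db = open_history()
first = record_run(db, "movieplotter", "0.0.1", {"input": "/foo/bar.nc", "outputdir": "/out"})
modified = dict(load_config(db, first), input="/foo/baz.nc")
print(find_identical_run(db, "movieplotter", "0.0.1", {"outputdir": "/out", "input": "/foo/bar.nc"}))
print(diff_configs(load_config(db, first), modified))
```

Looking up an identical earlier run before starting a new one is how a history of this kind can save CPU hours, I/O and storage: the existing result is reused instead of being recomputed.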
... And there is more
A development environment
One can test a plugin without interference:
- Plugin already exists? Override it locally
- Plugin is new? Plug it in locally
Similar behaviour with the cli:
- To plug in: export the EVALUATION_SYSTEM_PLUGINS environment variable (multiple local plugins are allowed)
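The override and plug-in behaviour can be sketched as a merge of a central plugin registry with locally declared plugins, where local entries win. This is only an illustration under stated assumptions: the value format used here (`path,module` pairs separated by `:`), the helper names and all paths are assumptions, so the Freva documentation should be consulted for the exact syntax.

```python
import os


def parse_local_plugins(value):
    """Parse 'path,module:path,module' pairs into {module: path} (assumed format)."""
    plugins = {}
    for item in filter(None, value.split(":")):
        path, sep, module = item.partition(",")
        if not sep or not path or not module:
            raise ValueError(f"expected 'path,module', got {item!r}")
        plugins[module] = path
    return plugins


def resolve_plugins(central, env=None):
    """Merge central and local plugins; local entries override central ones."""
    env = os.environ if env is None else env
    local = parse_local_plugins(env.get("EVALUATION_SYSTEM_PLUGINS", ""))
    merged = dict(central)
    merged.update(local)  # existing name: overridden; new name: plugged in
    return merged


# Hypothetical registry and environment, for illustration only.
central = {"movieplotter": "/central/plugins/movieplotter"}
env = {
    "EVALUATION_SYSTEM_PLUGINS": (
        "/home/user/dev/movieplotter,movieplotter:/home/user/dev/newtool,newtool"
    )
}
resolved = resolve_plugins(central, env)
print(resolved)
```

Because the merge happens only in the user's own environment, the centrally installed plugin stays untouched for everyone else.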
Plugin → Database → Plugin
1. A special Freva function in the plugin wrapper adds outputs to the databrowser:
if config_dict["link2database"] is True:
    self.add_output_to_databrowser(
        outputdir,
        project=config_dict["project"],
        product=config_dict["product"],
    )
2. The result is now part of the user's database:
3. Ready as input for new plugin:
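The Plugin → Database → Plugin loop can be mimicked with a toy in-memory catalogue: a first plugin registers its output under some facets, and a second plugin finds that output through a search. The `UserCatalogue` class, both plugin functions, the facet values and the file names are hypothetical and stand in for the real databrowser.

```python
class UserCatalogue:
    """Toy stand-in for the user's part of the databrowser."""

    def __init__(self):
        self._entries = []

    def add_output(self, path, **facets):
        """Register a result file under the given facets."""
        self._entries.append((path, dict(facets)))

    def search(self, **facets):
        """Return the paths whose facets match all given key/value pairs."""
        return [
            path
            for path, stored in self._entries
            if all(stored.get(key) == value for key, value in facets.items())
        ]


def first_plugin(catalogue, outputdir, link2database=True):
    """Pretend to compute a result and optionally register it."""
    result = f"{outputdir}/result.nc"
    if link2database:
        catalogue.add_output(result, project="user-data", product="movie-input")
    return result


def second_plugin(catalogue):
    """Use whatever the search returns as input files."""
    return catalogue.search(project="user-data", product="movie-input")


catalogue = UserCatalogue()
first_plugin(catalogue, "/path/2/OUTPUT")
first_plugin(catalogue, "/path/2/SCRATCH", link2database=False)
inputs = second_plugin(catalogue)
print(inputs)
```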
How to use Freva?
Freva framework:
- Extensive documentation, updated regularly
- Accessible from every Freva web interface
Freva plugins:
- Plugins have short descriptions
- Setups have detailed configuration info
- Many plugins have documentation pages
What's next?
- Freva REST API: connects to Solr so that data can be searched from many programming languages (working!)
- Freva Client: Freva library for data search, for Python and the CLI (working!)
- Data streaming: streams data in Zarr format from anywhere (file system, cloud, tape archive) (WIP)
- Freva Futures: registering a dataset that will exist in the future (WIP)
- Freva workflows: an efficient way to connect plugins (e.g. via CWL; concept stage)
- ...
Thanks for your attention!
Contact: freva@dkrz.de
Documentation: https://freva-clint.github.io/freva/
Workshop GitLab Repository: https://gitlab.dkrz.de/freva/freva_workshop